Bayesian Optimization using Student-t Processes

Authors

  • Amar Shah
  • Andrew Gordon Wilson
  • Zoubin Ghahramani
Abstract

Finding the global minimum of a function is often difficult. We consider efficiently minimizing functions which are computationally expensive to evaluate. A Bayesian approach to the global function optimization problem places a prior distribution on the function and chooses where to evaluate the function based on its posterior distribution given a set of observations. While many recent applications use Gaussian processes as a prior for the objective function, here we show that a Student-t process is an ideal prior for such a problem, as it is also nonparametric, but naturally models heavy-tailed behaviour and has a predictive covariance which explicitly depends on the observations.

1 The Student-t Process

We begin by deriving the Student-t process, its marginal likelihood and its predictive distribution, starting from a hierarchical Gaussian process model. We define a prior over continuous functions using the following generative model

$$
r^{-1} \sim \Gamma\!\left(\tfrac{\nu}{2}, \tfrac{\rho}{2}\right), \qquad y \mid r \sim \mathcal{GP}\!\left(0,\; r(\nu - 2)k/\rho\right), \tag{1}
$$

where ν > 2, ρ > 0 and k : R × R → R is a kernel function. If we marginalize over r, y is a scaled mixture of Gaussian processes. Suppose y = (y1, ..., yN) is a finite collection of observations at input points x1, ..., xN ∈ R, and let K be the Gram matrix such that Kij = k(xi, xj). We can compute the marginal probability of these observations under the generative prior above as follows:

$$
\begin{aligned}
p(y) &= \int p(y \mid r)\, p(r)\, dr \\
     &= \frac{(\pi(\nu - 2))^{-N/2}\,(\rho/2)^{\frac{\nu + N}{2}}}{|K|^{1/2}\,\Gamma(\nu/2)} \int r^{-(\nu + N)/2 - 1} \exp\!\left(-\frac{\rho}{2r}\left(1 + \frac{y^\top K^{-1} y}{\nu - 2}\right)\right) dr \\
     &= (\pi(\nu - 2))^{-N/2}\, \frac{\Gamma((\nu + N)/2)}{\Gamma(\nu/2)}\, |K|^{-1/2} \left(1 + \frac{y^\top K^{-1} y}{\nu - 2}\right)^{-(\nu + N)/2}. \tag{2}
\end{aligned}
$$

The Student-t process [O'Hagan, 1991; O'Hagan et al., 1999] has been used in a number of applications [Yu et al., 2007; Zhang and Yeung, 2010; Xu et al., 2011]. Our parameterization differs slightly from previous constructions.
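As a concrete illustration of the hierarchical model in Eq. (1) and the closed-form marginal likelihood in Eq. (2), the sketch below samples from the generative prior and evaluates log p(y). The squared-exponential kernel, the hyperparameter values and the jitter term are illustrative assumptions rather than choices made in the paper; note also that ρ cancels out of Eq. (2), so the marginal likelihood only needs K and ν.

```python
# A minimal NumPy/SciPy sketch of the generative prior in Eq. (1) and the
# closed-form marginal likelihood in Eq. (2). The kernel, hyperparameters
# and jitter are assumed choices for illustration only.
import numpy as np
from scipy.special import gammaln

def se_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') (an assumed choice of k)."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

def sample_prior(X, nu=5.0, rho=1.0, jitter=1e-8, rng=None):
    """Draw y from Eq. (1): r^{-1} ~ Gamma(nu/2, rho/2),
    y | r ~ GP(0, r (nu - 2) k / rho), evaluated at inputs X."""
    rng = np.random.default_rng() if rng is None else rng
    K = se_kernel(X, X) + jitter * np.eye(len(X))
    # 1/r ~ Gamma(shape=nu/2, rate=rho/2); NumPy's gamma is parameterized by scale.
    r = 1.0 / rng.gamma(shape=nu / 2.0, scale=2.0 / rho)
    cov = r * (nu - 2.0) / rho * K
    return rng.multivariate_normal(np.zeros(len(X)), cov)

def log_marginal_likelihood(y, K, nu):
    """log p(y) from Eq. (2): a multivariate Student-t density with
    covariance K and nu > 2 degrees of freedom (rho integrates out)."""
    N = len(y)
    alpha = y @ np.linalg.solve(K, y)          # y^T K^{-1} y
    _, logdetK = np.linalg.slogdet(K)
    return (-0.5 * N * np.log(np.pi * (nu - 2.0))
            + gammaln((nu + N) / 2.0) - gammaln(nu / 2.0)
            - 0.5 * logdetK
            - 0.5 * (nu + N) * np.log1p(alpha / (nu - 2.0)))
```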


Similar papers

Hypervolume-based Multi-objective Bayesian Optimization with Student-t Processes

Student-t processes have recently been proposed as an appealing alternative nonparametric function prior. They feature enhanced flexibility and predictive variance. In this work, the use of Student-t processes is explored for multi-objective Bayesian optimization. In particular, an analytical expression for the hypervolume-based probability of improvement is developed for independent Student-t ...
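The hypervolume-based criterion itself is not reproduced in the excerpt above. As a much simpler single-objective stand-in (not the method of the cited paper), the probability of improvement under a Student-t predictive marginal is just the Student-t CDF evaluated at the standardized improvement; the sketch below assumes the predictive mean, scale and degrees of freedom are already available.

```python
# Simplified single-objective analogue (not the hypervolume-based criterion
# from the cited paper): probability of improvement under a Student-t
# predictive marginal with mean mu, scale sigma and dof degrees of freedom.
from scipy.stats import t as student_t

def probability_of_improvement(mu, sigma, dof, f_best):
    """P(f(x) < f_best) when f(x) ~ Student-t(dof, loc=mu, scale=sigma),
    for a minimization problem."""
    z = (f_best - mu) / sigma
    return student_t.cdf(z, df=dof)
```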


Student-t Processes as Alternatives to Gaussian Processes

We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hier...
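For reference, the closed-form predictive distribution summarized in this excerpt has the property highlighted in the main abstract: the predictive covariance is rescaled by the observed data. The sketch below conditions a zero-mean Student-t process, in the covariance parameterization used in the abstract above, on observations y1; it is a bare-bones illustration that omits observation noise and numerical safeguards.

```python
# Sketch of the Student-t process predictive distribution (zero mean assumed),
# conditioning a multivariate Student-t with dof nu and covariance
# [[K11, K12], [K21, K22]] on observations y1.
import numpy as np

def tp_predict(K11, K21, K22, y1, nu):
    """Return the predictive mean, covariance and degrees of freedom of y2 | y1."""
    n1 = len(y1)
    K11_inv_y1 = np.linalg.solve(K11, y1)
    mean = K21 @ K11_inv_y1
    beta = y1 @ K11_inv_y1                          # data-dependent scaling term
    schur = K22 - K21 @ np.linalg.solve(K11, K21.T)
    cov = (nu + beta - 2.0) / (nu + n1 - 2.0) * schur
    nu_post = nu + n1
    return mean, cov, nu_post
```

The factor (ν + β − 2)/(ν + n1 − 2), with β = y1ᵀK11⁻¹y1, inflates or deflates the predictive covariance depending on how surprising the observed data are, which is the sense in which the predictive covariance "explicitly depends on observations".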


Filtering Outliers in Bayesian Optimization

Jarno Vanhatalo, Pasi Jylänki, and Aki Vehtari. Gaussian process regression with Student-t likelihood. In NIPS, pages 1910–1918, 2009. Amar Shah, Andrew Gordon Wilson, and Zoubin Ghahramani. Student-t processes as alternatives to Gaussian processes. In AISTATS, pages 877–885, 2014. Anthony O'Hagan. On outlier rejection phenomena in Bayes inference. Journal of the Royal Statistical Society. Seri...


Robust Bayesian Optimization with Student-t Likelihood

Bayesian optimization (BO) has recently attracted the attention of the automatic machine learning community for its excellent results in hyperparameter tuning. BO is characterized by the sample efficiency with which it can optimize expensive black-box functions. The efficiency is achieved in a fashion similar to learning-to-learn methods: surrogate models (typically in the form of Gaussian proce...


Bayesian Student-t Stochastic Volatility Models via a Two-stage Scale Mixtures Representation

In this paper, we provide a statistical analysis of Stochastic Volatility (SV) models using a fully Bayesian approach. The Student-t distribution is chosen as an alternative to the normal distribution for modelling the white noise. Bayesian computation of the SV models relies entirely on Markov chain Monte Carlo methods. In particular, to speed up the efficiency of the Gibbs sampling scheme, we ...




Publication date: 2013